Supplementary Materials to Adaptive and Transparent Cache Bypassing for GPUs

Authors

Ang Li

Gert-Jan van den Braak

Akash Kumar

Henk Corporaal

Abstract

This document is the supplementary file for the corresponding SC-15 conference paper titled "Adaptive and Transparent Cache Bypassing for GPUs". In this document, we first show the experimental figures for the four additional GPU platforms that could not fit into the original paper due to the page limit. We then show the simulation results for the hardware approach that aims to reduce bypassing overhead. Finally, we analyze the performance patterns of the applications with respect to different bypassing thresholds, which may explain why certain applications benefit significantly more from cache bypassing than others.

